Partition Refinement of Component Interaction Automata: 
Why Structure Matters More Than Size 



Markus Lumpe and Rajesh Vasa 



Faculty of Information & Communication Technologies 
Swinburne University of Technology 
Hawthorn, Australia 



{mlumpe,rvasa}@swin.edu.au



Automata-based modeling languages, like Component Interaction Automata, offer an attractive means
to capture and analyze the behavioral aspects of interacting components. At the center of these
modeling languages we find finite state machines that allow for a fine-grained description of how and
when specific service requests may interact with other components or the environment. Unfortunately,
automata-based approaches suffer from exponential state explosion, a major obstacle to the
successful application of these formalisms in modeling real-world scenarios. In order to cope with the
complexity of individual specifications we can apply partition refinement, an abstraction technique
that alleviates the state explosion problem. But this technique, too, exhibits exponential time and space
complexity and, worse, does not offer any guarantees for success. To better understand why
partition refinement succeeds in some cases while it fails in others, we conducted an empirical study
on the performance of a partition refinement algorithm for Component Interaction Automata
specifications. As a result we have identified suitable predictors for the expected effectiveness of partition
refinement. It is the structure, not the size, of a specification that weighs more heavily on the outcome
of partition refinement. In particular, Component Interaction Automata specifications for real-world
systems are capable of producing scale-free networks containing structural artifacts that can help
the partition refinement algorithm not only converge earlier, but also, on occasion, yield a significant
state space reduction.

1 Introduction 

Component Interaction Automata E [H offers a well-balanced formal modeling framework to cap- 
ture both the temporal and the hierarchical aspects of cooperating components in modern real-world 
component-oriented software systems. The Component Interaction Automata formalism provides two 
component-oriented software development processes: the architectural description of the system be- 
ing developed and the formal verification of the intrinsic properties of the system under consideration 
iPTl . The Component Interaction Automata modeling language builds on I/O Automata [17], Interface
Automata [2], and Team Automata H that all employ an automata-based language to represent the
assumptions about a system's capabilities to interact with the environment or other components. However,
unlike its predecessors, Component Interaction Automata distinguishes between components and
component instances [14]. This embodies a crucial difference that makes the Component Interaction Automata
approach more suitable for the specification of real-world component-oriented systems [15].

Unfortunately, automata-based modeling approaches suffer from combinatorial state space explosion 
with respect to the size of the modeled system. When defining the composition of components, we need 
to construct the product automaton [12] of the system being specified. Even though not all states in the 
product automaton may be reachable (i.e., they can be removed from the system), composite component 

Cámara, Canal, and Salaün (Eds.)

Component and Service Interoperability (WCSI10) © M. Lumpe and R. Vasa 







interaction automata will eventually grow to a size where an effective analysis of the system properties
may not be feasible [16].

It is for this reason that we have been studying suitable abstraction mechanisms in order to distill
smaller, yet behaviorally equivalent, specifications for a given component interaction automaton. In
particular, we have developed a bisimulation-based partition refinement algorithm for Component
Interaction Automata [16, 13]. Partition refinement [11] constructs, if possible, a new image of a given
automaton, where the states of the new automaton correspond to the equivalence classes of the old
automaton. The granularity of the refinement process depends on the underlying equivalence relation being
used. For Component Interaction Automata we use weak bisimulation, a behavioral equivalence relation
that abstracts from internal component synchronizations. In other words, partition refinement for
Component Interaction Automata equates both behaviorally equivalent substructures of an automaton and states
that are solely connected by internal component synchronizations [16, 15].



Figure 1: Component interaction automaton C620C915 and its reduced variant C620C915'. (a) Composite automaton; (b) Reduced automaton.

Consider, for example, the graphical representation of the automata C620C915 and C620C915' shown
in Figure 1. Automaton C620C915 is drawn from a sample of experimental component interaction
automata specifications used in our study. The states s1, s3, s4, and s6 in C620C915 are weakly bisimilar
and belong, therefore, to the same equivalence class denoted by state r1 in automaton C620C915' (cf.
Figure 1(b)). This small example illustrates two specific properties of Component Interaction Automata
and partition refinement. First, partition refinement through weak bisimulation does not remove all
internal component synchronizations. Both (C620,a6,C915) and (C620,a3,C915) have to remain in
C620C915' as their respective target states offer different interaction capabilities. Second, the states
s1, s3, s4, and s6 in C620C915 form a community or synchronization clique [15] that gives rise to
a significant reduction. Here, the reduction involves a terminal state, but if C620C915' were to occur
within a larger system, this particular effect would enable the partition refinement algorithm to yield a
better reduction ratio.

The presence of synchronization cliques in an automaton is of particular significance for the
understanding of the performance of the partition refinement algorithm, as the refinement process itself is not
guaranteed to succeed. In order to identify the reasons why partition refinement can sometimes
yield strong state space reduction ratios [16], while it fails completely on other occasions, we have run
an analysis on a sample of 1,680 experimental composite systems. Each system consists of between
2 and 11 machine-generated Component Interaction Automata specifications that all enjoy topological
properties similar to real-world software systems [27, 26]. Every experiment was allowed to run at most
two hours and was carried out on a Mac Pro equipped with one 2.66 GHz Quad-Core processor and 8GB
1066 MHz DDR3 memory running Mac OS X 10.6.3. The results were analyzed using logistic
regression Hl|6]], a statistical method for the prediction of the probability of the occurrence of a specific event.
In particular, we wanted to determine which features of a Component Interaction Automata specification
can serve as explanatory variables or predictors for a specific expected outcome of running partition
refinement on a given component interaction automaton. By identifying suitable explanatory variables
we can construct a model that explains how and when partition refinement for a Component Interaction
Automata specification is to succeed or fail.

The rest of the paper is organized as follows: in Section 2 we briefly review the Component
Interaction Automata formalism and present a corresponding partition refinement algorithm. We proceed with
an analysis of the structural properties of Component Interaction Automata specifications in Section 3.
In particular, we study selected graph properties and highlight how they affect the possible outcome of
partition refinement. Section 4 presents the results of our logistic regression analysis. In particular, we
discuss four models using maximum likelihood estimation (MLE) and demonstrate that structure, not
size, provides good estimates for the success of partition refinement. We conclude with a summary of
main observations in Section 5.

2 Partition Refinement for Component Interaction Automata 

The Component Interaction Automata formalism aims at the specification and verification of component- 
based software systems at an interface level 0[H. It is interfaces that allow us to define a suitable decom- 
position of a system into its logical units, the components, and that capture the components' interactive 
behavior in a concise way. Collectively, interfaces and the information they relay form a contractual 
specification Q that explicitly states all assumptions about a component's (or system's) deployment 
environment. Using the Component Interaction Automata formalism we can reason about contractual 
specifications in at least two ways: "Does the system respond to service requests in the expected order?"
and "What is a behaviorally equivalent specification for a given component or system?" 

Definition 1 (Component Interaction Automata) A component interaction automaton $\mathcal{C}$ is a
quintuple $(Q, Act, \delta, I, H)$ where:

• $Q$ is a finite set of states,

• $Act$ is a finite set of actions,

• $\delta \subseteq Q \times L \times Q$ is a finite set of labeled transitions, where $L \subseteq \{(S(H) \cup \{-\}) \times Act \times (S(H) \cup \{-\})\} \setminus \{(\{-\} \times Act \times \{-\})\}$ is the set of structured labels induced by $\mathcal{C}$,

• $I \subseteq Q$ is a nonempty set of initial states, and

• $H$ is a hierarchical composition structure with either

  • $H = (C_1, \ldots, C_n)$ denoting a primitive composition of the component instances $C_1, \ldots, C_n$, such
    that $S(H) = \bigcup_{i=1}^{n} \{C_i\}$, or

  • $H = (H_1, \ldots, H_m)$, where $H_1, \ldots, H_m$ are hierarchies of component instances satisfying the
    structural property $\forall\, 1 \le i, j \le m,\ i \ne j : S(H_i) \cap S(H_j) = \emptyset$, such that $S(H) = \bigcup_{i=1}^{m} S(H_i)$.

Each component interaction automaton is further characterized by two sets $P \subseteq Act$, the provided
actions, and $R \subseteq Act$, the required actions. These sets capture the automaton's enabled interface with an
environment. We write $\mathcal{C}^{R}_{P}$ to denote an automaton that is input-enabled in $P$ and output-enabled in
$R$. □

The composition of component interaction automata is defined in the usual way. The behavior of the
composite system is the cross-product of its component behaviors. We apply the architectural constraints
$P$ and $R$, the set of provided services and the set of required services, respectively, to control which
transitions can occur in the composite automaton. In general, we use $P$ and $R$ to contain only those
actions that appear in input and output transitions of the composite automaton.

Definition 2 (Component Interaction Automata Composition) Let $\mathcal{S} = \{(Q_i, Act_i, \delta_i, I_i, H_i)\}_{i \in \mathcal{I}}$ be
a system of pairwise disjoint component interaction automata and let $P$, $R$ be the provided and required
actions. Then $\mathcal{C}^{R}_{P} = (\prod_{i} Q_i, \bigcup_{i} Act_i, \delta_{\mathcal{S}^{R}_{P}}, \prod_{i} I_i, (H_i)_{i \in \mathcal{I}})$ is the composite component interaction automaton
of $\mathcal{S}^{R}_{P}$, where $q_j$ denotes a function $\prod_{i} Q_i \to Q_j$, the projection from product state $q$ to the $j$th component's
state, and

$$\delta_{\mathcal{S}^{R}_{P}} = \delta_{OldSync} \cup \delta_{NewSync} \cup \delta_{Input} \cup \delta_{Output}$$

with

$$\delta_{OldSync} = \{(q, (n_1, a, n_2), q') \mid \exists i : (q_i, (n_1, a, n_2), q'_i) \in \delta_i \wedge \forall j \in \mathcal{I},\ j \ne i : q_j = q'_j\},$$

$$\delta_{NewSync} = \{(q, (n_1, a, n_2), q') \mid \exists i_1, i_2 \in \mathcal{I},\ i_1 \ne i_2 : (q_{i_1}, (n_1, a, -), q'_{i_1}) \in \delta_{i_1} \wedge (q_{i_2}, (-, a, n_2), q'_{i_2}) \in \delta_{i_2} \wedge \forall j \in \mathcal{I},\ i_1 \ne j \ne i_2 : q_j = q'_j\},$$

$$\delta_{Input} = \{(q, (-, a, n), q') \mid a \in R \wedge \exists i : (q_i, (-, a, n), q'_i) \in \delta_i \wedge \forall j \in \mathcal{I},\ j \ne i : q_j = q'_j\},$$

$$\delta_{Output} = \{(q, (n, a, -), q') \mid a \in P \wedge \exists i : (q_i, (n, a, -), q'_i) \in \delta_i \wedge \forall j \in \mathcal{I},\ j \ne i : q_j = q'_j\}.$$ □

Composition in Component Interaction Automata is defined over an arbitrary number of components.
The behaviors of the individual components are simultaneously recombined to yield the composite
behavior. This flexibility comes, however, at a price. The effects of the exponential combinatorial time
and space explosion appear rather quickly [16] and the resources required to build a product automaton
exceed practical limits. For this reason we compose only two automata at a time and apply partition
refinement to the result immediately in our experiments. This approach remains faithful to the
underlying specification, but it provides us with a scenario in which we can think of partition refinement as an
"on-the-fly" technique.

In order to apply partition refinement to a Component Interaction Automata specification we need to define
a suitable equivalence relation. We use bisimulation, in particular weak bisimulation, for this purpose.
Weak bisimulation provides us with an equivalence relation that equates automata that only differ in the
lengths of occurring internal component synchronization sequences [15, 18].

Definition 3 (Weak Bisimulation for Component Interaction Automata) Given two component
interaction automata $A = (Q_A, Act_A, \delta_A, I_A, H)$ and $B = (Q_B, Act_B, \delta_B, I_B, H)$ with an identical composition
hierarchy $H$, a binary relation $\mathcal{R} \subseteq Q \times Q$ with $Q = Q_A \cup Q_B$ is a weak bisimulation, if it is symmetric and
$(q, p) \in \mathcal{R}$ implies, for all $l \in L$, $L = L_A \cup L_B$ being the set of structured labels induced by $A$ and $B$,

• whenever $q \stackrel{l}{\longrightarrow} q'$, then $\exists p'$ such that $p \stackrel{l}{\Longrightarrow} p'$ and $(q', p') \in \mathcal{R}$.

Two component interaction automata $A$ and $B$ are weakly bisimilar, written $A \approx B$, if they are related by
some weak bisimulation. □
We write $p \stackrel{l}{\Longrightarrow} p'$ to denote that an automaton, $\mathcal{C} = (Q, Act, \delta, I, H)$, can evolve from state $p$ to $p'$
through an interaction $l$ with a possibly empty sequence of internal component synchronizations
occurring before and after transition $l$. The relation $p \stackrel{l}{\Longrightarrow} p'$ gives rise to a splitter function that provides
us with a means to compute the equivalence classes up to weak bisimulation for a given automaton $\mathcal{C}$.
The splitter function is a Boolean predicate $\gamma : Q \times L \times \mathbb{S} \to \{true, false\}$, where $\mathbb{S} \subseteq 2^{Q}$ is a set of
candidate equivalence classes for $\mathcal{C}$. Let $q$ be a state, $P \in \mathbb{S}$ be a candidate equivalence class, and $l$ be a
structured label for a component interaction automaton $\mathcal{C}$ to be refined. Then the corresponding splitter
is

$$\gamma(q, l, P) := \begin{cases} true & \text{if there is } p \in P \text{ such that } q \stackrel{l}{\Longrightarrow} p, \\ false & \text{otherwise} \end{cases} \tag{1}$$

Partition refinement is a function $\mathbb{S} \times L \times \mathbb{S} \to \mathbb{S}$ that takes three arguments: $X_{i-1} \in \mathbb{S}$, the partition
resulting from step $i - 1$, $l \in L$, the splitter label, and $P_i \in \mathbb{S}$, the equivalence classes in step $i$.

$$refine(X_{i-1}, l, P_i) := \bigcup_{X \in X_{i-1}} \Big( \bigcup_{v \in \{true, false\}} \{ q \mid \forall q \in X.\ \gamma(q, l, P_i) = v \} \Big) - \{\emptyset\} \tag{2}$$

Our partition refinement algorithm differs in two aspects compared with the one proposed by
Hermanns [11]. First, we add an iteration over the labels. Experiments have shown that the partition
refinement algorithm will require fewer splitters if we add this extra iteration. Furthermore, we split X in
two sets: X^{1}, the set of singleton partitions, and X^{>1}, the set of partitions comprising two or more states.
Singleton partitions cannot be further refined and, therefore, we do not need to test them again. The
partition refinement algorithm has to test only X^{>1}:

X^{>1}_{i-1} ← {Q}; X^{1}_{i-1} ← ∅; Repeat ← true;
while Repeat
do for l ∈ L
   do EqvClasses ← X^{>1}_{i-1} ∪ X^{1}_{i-1}; Repeat ← false;
      while EqvClasses ≠ ∅
      do choose P_i ∈ EqvClasses;
         (X^{>1}_{i}, X^{1}_{i}) ← refine(X^{>1}_{i-1}, l, P_i);
         if X^{>1}_{i} ≠ X^{>1}_{i-1}
         then X^{>1}_{i-1} ← X^{>1}_{i}; X^{1}_{i-1} ← X^{1}_{i-1} ∪ X^{1}_{i};
              EqvClasses ← X^{>1}_{i-1} ∪ X^{1}_{i-1};
              Repeat ← true;
         else EqvClasses ← EqvClasses − {P_i};
return X^{>1}_{i-1} ∪ X^{1}_{i-1};

The above algorithm yields a partition that is minimal up to weak bisimulation (i.e., the fixed-point)
with respect to the number of required states for a given automaton $\mathcal{C}$. This algorithm is part of our
experimental composition framework for Component Interaction Automata, implemented in PLT-Scheme,
that provides support not only for the specification and refinement of component interaction automata,
but also for the extraction of metrics data [16, 13].
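The refinement loop can be illustrated with a minimal Python sketch. It is not the PLT-Scheme implementation described above: it is a naive fixed-point version that omits the split into singleton and non-singleton classes, and it rests on two assumptions introduced here for illustration only, namely that a label whose sender and receiver are both components counts as an internal synchronization, and that all internal steps are collapsed into one hypothetical silent label `TAU`. The names `is_internal`, `tau_closure`, `weak_successors`, `splitter`, `refine`, and `partition_refinement` are likewise hypothetical.

```python
ENV = "-"      # environment placeholder in a structured label (n1, a, n2)
TAU = "tau"    # hypothetical marker standing for "zero or more internal steps"


def is_internal(label):
    """Assumption: a label is internal when both ends are components."""
    sender, _action, receiver = label
    return sender != ENV and receiver != ENV


def tau_closure(states, transitions):
    """Map each state to the states reachable via zero or more internal steps."""
    closure = {q: {q} for q in states}
    changed = True
    while changed:
        changed = False
        for src, label, dst in transitions:
            if not is_internal(label):
                continue
            for reach in closure.values():
                if src in reach and dst not in reach:
                    reach.add(dst)
                    changed = True
    return closure


def weak_successors(states, transitions):
    """weak[(q, l)] holds every p with q ==l==> p (internal steps before and after l)."""
    closure = tau_closure(states, transitions)
    weak = {}
    for q in states:
        weak[(q, TAU)] = set(closure[q])
        for src, label, dst in transitions:
            if src in closure[q] and not is_internal(label):
                weak.setdefault((q, label), set()).update(closure[dst])
    return weak


def splitter(weak, q, label, block):
    """gamma(q, l, P): True iff some p in P satisfies q ==l==> p."""
    return not weak.get((q, label), set()).isdisjoint(block)


def refine(partition, label, block, weak):
    """Split every class by the truth value of gamma(., label, block); drop empty parts."""
    result = set()
    for cls in partition:
        hits = frozenset(q for q in cls if splitter(weak, q, label, block))
        result.update(part for part in (hits, cls - hits) if part)
    return result


def partition_refinement(states, transitions):
    """Iterate over labels and candidate classes until no splitter changes the partition."""
    weak = weak_successors(states, transitions)
    labels = sorted({l for _, l, _ in transitions if not is_internal(l)}, key=str) + [TAU]
    partition = {frozenset(states)}
    repeat = True
    while repeat:
        repeat = False
        for label in labels:
            for block in list(partition):
                refined = refine(partition, label, block, weak)
                if refined != partition:
                    partition, repeat = refined, True
                    break  # candidate classes changed; they are re-scanned in a later pass
    return partition


# Hypothetical four-state automaton: s1 and s2 differ only by an internal step.
states = {"s0", "s1", "s2", "s3"}
transitions = {
    ("s0", ("-", "a", "C1"), "s1"),    # input from the environment
    ("s1", ("C1", "b", "C2"), "s2"),   # internal synchronization
    ("s2", ("C2", "c", "-"), "s3"),    # output to the environment
}
classes = partition_refinement(states, transitions)
```

In this toy example the states s1 and s2 end up in the same class because they are connected solely by an internal synchronization, which is the effect the section describes for synchronization cliques.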



3 Structural Analysis 

The state explosion problem in Component Interaction Automata is intrinsic to all algebraic software
modeling techniques that seek to express properties of the modeled system through an automata-based
approach. The actual specifications yield directed graphs in which vertices are the states of the systems
and edges denote possible interactions with the environment or other components. Graph theory [9, 19]
provides a rich source for a meaningful interpretation of properties of Component Interaction Automata
specifications. However, even though partition refinement explores the communication structure of an
automaton, partition refinement itself does not actually exploit the topology of the automaton's graph to
fine-tune the refinement process.



Figure 2: Emerging preferential attachment in Component Interaction Automata. (a) Experimental automaton C620; (b) Experimental automaton C915.

Consider the two automata, C620 and C915, shown in Figure 2. These are two machine-generated
specifications to simulate real-world software systems. Both automata exhibit some typical graph
properties that we find in real-world software systems. First, software is not made of "Lego blocks" GUI .
The topology of both automata varies greatly. The distribution of transitions in different automata does
not follow a uniform pattern. Some states attract more transitions than others. There is no unique size in
terms of number of states and number of transitions. Nevertheless, the ratio between both quantities
assumes some common value, a feature that becomes even more pronounced when we compose automata
specifications and refine the resulting composite.

Second, the automata C620 and C915 exhibit preferential attachment [3]. States that are already
well connected attract new transitions more easily than others. This "rich-get-richer" strategy is typical
for software systems [25]. The distribution of functionality in a software system is neither regular nor
random. Developers prefer to organize and maintain software systems around a small number of highly
complex abstractions [25]. These abstractions constitute virtual hubs in the systems and appear to
guarantee not only the proper functioning of the software, but also the ability to evolve a software system in order
to meet changing requirements in the future [25].

We can measure these structural features using two concepts: the scaling exponent $\beta$ [7, 26] to
denote the power-scaling relationship between the number of states and the number of transitions of an
automaton and the Gini coefficients [10, 25] of the incoming and outgoing transitions in an automaton in
order to quantify the degree of inequality in the distribution of these attributes in a given automaton. Both
measures provide suitable summary metrics of the underlying directed graph topology of an automaton.
Moreover, these measures can also serve as reliable predictors for the success of partition refinement.
Hence, the better we understand the topological properties of a given automaton the more we can guide
the partition refinement process, if possible, to yield a significant state space reduction.

The power-scaling relationship for a component interaction automaton $\mathcal{C} = (Q, Act, \delta, I, H)$ between
the number of states and the number of transitions is given by

$$|\delta| \sim |Q|^{\beta} \tag{3}$$

where the scaling exponent $\beta$ is the ratio between the natural logarithm of the number of transitions and the
natural logarithm of the number of states in automaton $\mathcal{C}$:

$$\beta = \frac{\ln |\delta|}{\ln |Q|} \tag{4}$$
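As a small worked illustration of the scaling exponent, the following Python sketch computes $\beta = \ln|\delta| / \ln|Q|$ from the two counts. The function name `scaling_exponent` and the sample counts are hypothetical and not taken from the experimental data set.

```python
import math


def scaling_exponent(num_states, num_transitions):
    """Return beta = ln|delta| / ln|Q|, so that |delta| = |Q| ** beta."""
    if num_states < 2 or num_transitions < 1:
        # ln|Q| would be zero or undefined, or ln|delta| undefined.
        raise ValueError("beta is undefined for |Q| < 2 or |delta| < 1")
    return math.log(num_transitions) / math.log(num_states)


# Hypothetical counts: a chain of three states, and a fully connected automaton.
beta_chain = scaling_exponent(3, 2)          # |delta| = |Q| - 1
beta_dense = scaling_exponent(100, 10_000)   # |delta| = |Q| ** 2
```

For the three-state chain the result is $\ln 2 / \ln 3 \approx 0.63$, and for the fully connected case it is 2, which coincides with the bounds $a = 0.63$ and $b = 2$ quoted for the sample.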

The actual value of the scaling exponent $\beta$ for our sample of 1,680 machine-generated automata
specifications satisfies the probability density function

$$P[a \le X \le b] = \int_{a}^{b} f(x)\,dx, \quad \text{with } a = 0.63 \text{ and } b = 2 \tag{5}$$

In other words, the scaling exponent $\beta$ is at its minimum, when

$$|\delta| = |Q| - 1 \tag{6}$$

and at its maximum, when

$$|Q| = \sqrt{|\delta|} \tag{7}$$

The values of $\beta$ show remarkable similarity to those found in real-world software systems [24, 27].
The observed frequency distribution of $\beta$ for our experimental data set is shown in Figure 3(a). The
distribution of $\beta$ approximates a normal distribution with a mean value $\mu_{\beta} = 1.36$ and a standard deviation
$\sigma_{\beta} = 0.19$.




Figure 3: The frequency distribution and evolution of the scaling exponent $\beta$.

As a system matures, with respect to a growing number of states, the scaling exponent $\beta$ converges
towards the mean (cf. Figure 3(b)). Our experiments also confirm an observation made by Vasa et al. that
the scaling exponents for real-world software systems plateau at a system-specific value as the software
systems mature [27].

The Gini coefficient is a well-established measure to quantify the inequality of income distributions
in modern societies [23] that we have previously applied in the analysis of evolving software systems
[25]. The Gini coefficient is a number between 0 and 1, where 0 denotes a perfect equality (e.g., every
state in the system possesses the same number of outgoing transitions) and 1 signifies a perfect inequality
(e.g., all states except one have no incoming transitions). The Gini coefficient is an entropic inequality
measure. If its value is closer to 1, then centralization of behavior in the system is greater with fewer
states contributing to the information entropy GTH of the automaton. In other words, there are "hubs"
present in the automaton that centralize behavioral options and, therefore, yield a higher level of
abstraction. With respect to partition refinement this means that the algorithm is more likely to succeed for an
automaton that contains structural artifacts with Gini coefficients closer to 1.

For a population with values $x_i$, $1 \le i \le n$, that are indexed in non-decreasing order ($x_i \le x_{i+1}$), the
Gini coefficient is

$$G = \frac{\sum_{i=1}^{n} (2i - n - 1)\, x_i}{n \sum_{i=1}^{n} x_i} \tag{8}$$
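A minimal Python sketch of the Gini coefficient for the in-degrees and out-degrees of an automaton follows. It uses the common closed form for values sorted in non-decreasing order, $G = \sum_{i=1}^{n} (2i - n - 1)\,x_i \,/\, (n \sum_{i=1}^{n} x_i)$; treating this as the form behind Equation (8) is an assumption, as are the function names `gini` and `degree_ginis` and the convention of returning 0 for an empty or all-zero population.

```python
def gini(values):
    """Gini coefficient of non-negative values (closed form for sorted data)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # assumed convention: no inequality when nothing is distributed
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def degree_ginis(states, transitions):
    """Return (G_in, G_out) for transitions given as (source, label, target) triples."""
    in_deg = {q: 0 for q in states}
    out_deg = {q: 0 for q in states}
    for src, _label, dst in transitions:
        out_deg[src] += 1
        in_deg[dst] += 1
    return gini(in_deg.values()), gini(out_deg.values())
```

Note that for a finite population of size $n$ this form attains at most $(n-1)/n$, so the value 1 for perfect inequality is only approached as $n$ grows; for example, `gini([0, 0, 0, 4])` evaluates to 0.75.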



We use $G_{IN}$ and $G_{OUT}$ to denote the Gini coefficient of incoming transitions and outgoing transitions,
respectively. Observed value ranges of the Gini coefficients for our experimental automata are shown in
Figure 4. The Gini coefficients of incoming transitions $G_{IN}$ (cf. Figure 4(a)) follow closely, though not
perfectly, a normal distribution with mean value $\mu_{G_{IN}} = 0.34$ and a standard deviation $\sigma_{G_{IN}} = 0.11$. In other
words, the distribution of incoming transitions in an automaton appears to be more likely independent
of the behavior being modeled by the automaton.





Figure 4: The frequency distribution of Gini coefficients for incoming and outgoing transitions. (a) Frequency distribution of $G_{IN}$; (b) Frequency distribution of $G_{OUT}$.



In contrast, the values of the Gini coefficients for outgoing transitions $G_{OUT}$ (cf. Figure 4(b)) deviate
significantly from a normal distribution. Even though the mean value $\mu_{G_{OUT}} = 0.38$ and the standard
deviation $\sigma_{G_{OUT}} = 0.15$ are not very different from their respective $G_{IN}$ values, the values of $G_{OUT}$ exhibit
a distinct positive skew with a fat tail of higher Gini coefficients. Skewed distributions emerge when
parameters have multiplicative effects [13] and certain factors prevail more than others. One such factor
is preferential attachment [3] that favors behaviorally-rich states. However, there is another aspect to the
concentration of outgoing transitions that is born of the partition refinement process itself. Consider
Figure 5(a), which depicts a fragment of the behavior of automaton C260C44 (we have omitted the labels to
enhance readability). At the center of this fragment we find a set of states (marked in bold blue) that form
a "synchronization clique" [15]. A synchronization clique appears when states that are solely connected by
internal component synchronizations become joined in one equivalence class by partition refinement. This
is very typical for Component Interaction Automata specifications and gives rise to significant state space
reduction ratios [15]. Figure 5(b) illustrates the corresponding effect of partition refinement. Where there
were five states before, we find just one now that concentrates on itself a large number of outgoing
transitions. In other words, we witness an "emergent hub" in the system that acts as a focal point for
previously disjoint behavioral choices. The fragment in Figure 5(a) has a Gini coefficient for outgoing
transitions of $G_{OUT} = 0.67$, whereas the fragment in Figure 5(b) has $G_{OUT} = 0.89$. A higher $G_{OUT}$ is more
likely to enable successful partition refinement than a smaller one.



4 Regression Analysis 

Do the size and the structure of a Component Interaction Automata specification influence the probability 
for success of partition refinement? In order to answer this question we conducted a number of regression 
analyses fl] El to determine whether an automaton's size, structure, or both impact the actual outcome 
of partition refinement and, if so, how. 

In statistics, regression analysis provides a means to study possible relationships between variables.
Regression analysis involves constructing models with one or more explanatory variables, $X_i$, $1 \le i \le n$,
and a response variable $Y$. For the analysis of partition refinement of Component Interaction Automata
specifications we use a special form of analysis, called logistic regression (H|6), that is applicable when
the response variable is a dichotomy. A logistic model can be described formally as follows. Let
$\pi(x) = \Pr(Y = 1 \mid X = x) = 1 - \Pr(Y = 0 \mid X = x)$ be the hypothesized proportion of an expected value $x$ within a
population. The corresponding logistic regression model CQ is

$$\pi(x) = \frac{e^{(a + bx)}}{1 + e^{(a + bx)}} \tag{9}$$

where $a$ is called the intercept and $b$ is called the regression coefficient.
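The logistic model can be evaluated with a few lines of Python. This is a sketch only: the function name `logistic` is hypothetical, no fitted values of the intercept and regression coefficient are given here, and the parameter values used in the example are placeholders. The two branches are an implementation detail that avoids overflow in `math.exp` for large arguments.

```python
import math


def logistic(x, a, b):
    """pi(x) = e^(a + b*x) / (1 + e^(a + b*x)), evaluated without overflow."""
    z = a + b * x
    if z >= 0:
        # Equivalent form 1 / (1 + e^(-z)); e^(-z) cannot overflow for z >= 0.
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe: z < 0, so e^z lies in (0, 1)
    return ez / (1.0 + ez)


# Placeholder parameters (not fitted values from the study).
p_mid = logistic(0.0, a=0.0, b=1.0)
```

With $a + bx = 0$ the model returns exactly 0.5, which is the point at which the predicted outcome switches from "failure" to "success".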

The response variable $Y$ is categorical, where 1 represents "success" and 0 denotes "failure." We
define two questions and store the corresponding answer in the associated response variables for our
analysis: "Does partition refinement succeed?" and "Does partition refinement require more than 5
minutes to converge?"

The explanatory variable $X$ can be either numerical or categorical. For the analysis of partition
refinement we use numerical variables. In particular, for a given automaton $\mathcal{C} = (Q, Act, \delta, I, H)$ we
construct four different logistic models with the explanatory variable $X$ being either $|Q|$, the size of the
automaton in terms of number of states, $\beta$, the power-scaling relationship between the number of states
and the number of transitions, $G_{IN}$, the concentration of incoming transitions, or $G_{OUT}$, the concentration
of outgoing transitions.

4.1 Experiment Setup and Analysis Approach 

We selected two random samples of Component Interaction Automata specifications from a pool of 
5,849 machine-generated candidates. The first set contained 840 automata for which partition refinement 
succeeded, whereas the second set, of equal size, comprised automata where partition refinement failed. 
In order to guarantee a meaningful comparison, we ensured that the automata in both sample sets have 
similar properties. The average size in terms of number of states is approximately 46, with 2 being the 
minimum and 892 being the maximum number of states in an automaton for both sets. Similarly, the 
reduction time for both sets of automata spans a range from 6 milliseconds to 2 hours. 

We applied Maximum Likelihood Estimation (MLE) to construct our logistic models and to determine
the corresponding regression parameters $a$ and $b$. To verify that there exists a dependency between the
explanatory variable $X$ and the response variable $Y$, we checked for each constructed model whether the
likelihood ratio $\chi^2$ is significant at 1 degree of freedom (df.) within a 95% confidence interval (cf.). If
a logistic model shows a statistically significant relationship, then we use it to derive the probability of
success for the full range of values of the explanatory variable and analyze how the probability of success
changes with the value of the explanatory variable.
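The significance check can be sketched in Python as follows. The sketch assumes the usual likelihood ratio statistic, twice the difference between the log-likelihood of the fitted model and that of an intercept-only model, and uses the identity that the upper tail of a $\chi^2$ distribution with 1 degree of freedom equals $\mathrm{erfc}(\sqrt{x/2})$. The function names and the sample data are hypothetical, and the fitted probabilities are assumed to lie strictly between 0 and 1.

```python
import math


def chi2_p_value_1df(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(chi2, 0.0) / 2.0))


def log_likelihood(ys, probs):
    """Bernoulli log-likelihood of 0/1 outcomes under the given probabilities."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y, p in zip(ys, probs))


def likelihood_ratio_test(ys, fitted_probs):
    """Compare a fitted model against the intercept-only model (constant success rate)."""
    base_rate = sum(ys) / len(ys)  # assumed to be strictly between 0 and 1
    ll_null = log_likelihood(ys, [base_rate] * len(ys))
    ll_model = log_likelihood(ys, fitted_probs)
    chi2 = 2.0 * (ll_model - ll_null)
    return chi2, chi2_p_value_1df(chi2)


# Hypothetical outcomes and fitted probabilities, for illustration only.
chi2, p_value = likelihood_ratio_test([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
significant = p_value < 0.05  # 95% level, 1 df (critical value about 3.841)
```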

In addition, to qualify the strength of the relationship between the explanatory variable and the re- 
sponse variable, we also compute two further measures: sensitivity and specificity. The former captures 
the probability of detecting a success of partition refinement when an actual reduction has occurred. The 
latter reflects the probability of detecting failure when partition refinement has indeed failed. Sensitivity 
and specificity provide guarantees that a constructed regression model is effective at detecting equally 
well both success and failure and is, in fact, better than pure random guessing JT]. 
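Sensitivity and specificity can be computed from the actual and predicted outcomes as in the following Python sketch. The function name `sensitivity_specificity` and the sample vectors are hypothetical; outcomes are encoded as 1 for "success" and 0 for "failure", as for the response variable.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for paired 0/1 outcomes."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # successes detected
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # successes missed
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # failures detected
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # failures missed
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome classes must be present in `actual`")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical balanced sample: three successes and three failures.
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

A model that guesses at random on a balanced sample would be expected to score about 50% on both measures, which is why both values need to be clearly above that mark.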

Ideally, the values for sensitivity and specificity should be as close to 100% as possible. In this case 
the model will correctly classify all successes and failures using just the explanatory variable. However, 
in practice models are never perfect. To further improve on the predictive power of the explanatory 
variable, we require the values of sensitivity and specificity to exceed 50% 01 by a comfortable margin. 
Due to the composition of our sample set, partition refinement succeeds, by default, in 50% of the cases. 
So, if for a given model the values of sensitivity and specificity are just around 50%, then the model only 
confirms the threshold already being embedded in our sample data set. Hence, only if we achieve values 
for sensitivity and specificity greater than 50% will the corresponding explanatory variable become a 
suitable predictor for the success or failure of partition refinement. 

4.2 Impact of Size on Partition Refinement 

The size of an automaton has a direct impact on the running time of partition refinement of Component
Interaction Automata specifications, as it is known to have exponential time and space complexity [16].
Using the upper limit of 5 minutes, an observed suitable threshold, we can construct a logistic model
that reliably predicts whether partition refinement will require more than 5 minutes based on the size,
in terms of number of states $|Q|$, of an automaton. Figure 6 illustrates the model-specific values. We
can expect partition refinement for systems with fewer than 200 states to always converge in less than 5
minutes. However, we have also observed cases in which partition refinement converged for automata
with up to 450 states within that limit. But above 450 states, partition refinement is likely to require more
than 5 minutes, as indicated in Figure 6(a).

Figure 6: A logistic model for the prediction of running time of partition refinement. (a) Partition refinement requires more than 5mins; (b) The influence of $|Q|$ on running time (horizontal axis: number of states).

The corresponding logistic model (cf. Figure 6(b)) confirms these estimates. In fact, the number
of states of an automaton $|Q|$ is a suitable predictor for the running time of partition refinement (i.e.,
$\chi^2 = 342.68$, 1 df. at 95% cf. with p-value of 0.0001). Sensitivity is at 87% and specificity at 99.8%. The
model asserts that if an automaton reaches more than 385 states, partition refinement will require, with a
probability of > 0.5, more than 5 minutes to converge. Moreover, the probability of partition refinement
to require more than 5 minutes is 1 for automata with more than 570 states.
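The 0.5 threshold follows directly from the logistic model in Equation (9). A short derivation, assuming a positive regression coefficient $b$ (the fitted values of $a$ and $b$ are not stated here):

```latex
\[
\pi(x) > \tfrac{1}{2}
\iff \frac{e^{a+bx}}{1+e^{a+bx}} > \tfrac{1}{2}
\iff e^{a+bx} > 1
\iff a + bx > 0
\iff x > -\frac{a}{b} \qquad (b > 0).
\]
```

Under that assumption, the reported threshold of 385 states corresponds to $-a/b \approx 385$ for the running-time model.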

However, size is not a suitable predictor for the success of partition refinement. We are unable
to construct a corresponding model ($\chi^2 = 0.01$, 1 df. at 95% cf. with p-value of 0.928). Based on
our analysis we find that the size of an automaton and success of partition refinement are independent
variables. Partition refinement may succeed or fail independent of the actual size of the automaton.
However, this does not mean that we can ignore size when considering the success of partition refinement.
There appears to be a natural resistance to successful partition refinement when automata increase in size.



4.3 Impact of Structure on Partition Refinement 

The size of an automaton does not yield a good predictor for the success of partition refinement. But 
what is the impact of structure on partition refinement, in particular with respect to a successful state 
space reduction? 

We have selected three topological attributes: $\beta$, $G_{in}$, and $G_{out}$. Does the ratio between states 
and transitions, in terms of the power-scaling relationship $|\delta| \sim |Q|^{\beta}$, provide us with a predictor 
for the success of partition refinement? The answer is yes. Consider Figure 7, which presents our observations 
for the scaling exponent $\beta$. There is a significant difference in the distribution of the successes and 
failures of partition refinement (cf. Figure 7(a)). We find that automata with higher $\beta$ values are less 
likely candidates for successful partition refinement than those with smaller $\beta$ values. The mean value of 
$\beta$ for success is $\mu_\beta = 1.29$, whereas that for failure is $\mu_\beta = 1.43$. Based on these observations 
it appears that partition refinement becomes more likely to succeed $1\sigma_\beta$ below the mean value $\mu_\beta$, 
and the chances for success diminish increasingly $1\sigma_\beta$ above the mean value $\mu_\beta$. 
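
To make the scaling exponent concrete, the following sketch computes $\beta$ for a single automaton from its state and transition counts. It assumes that the power-scaling relationship holds with a proportionality constant of 1, so that $\beta = \ln|\delta| / \ln|Q|$; this simplification and the function name `scaling_exponent` are assumptions for illustration.

```python
import math


def scaling_exponent(num_states: int, num_transitions: int) -> float:
    """Estimate beta from |delta| ~ |Q|^beta.

    ASSUMPTION: the proportionality constant is 1, hence
    beta = ln|delta| / ln|Q|.
    """
    if num_states < 2:
        raise ValueError("need at least 2 states (ln|Q| must be positive)")
    if num_transitions < 1:
        raise ValueError("need at least 1 transition")
    return math.log(num_transitions) / math.log(num_states)


if __name__ == "__main__":
    # Two hypothetical automata with 100 states each.
    print(scaling_exponent(100, 380))
    print(scaling_exponent(100, 725))
```

Under this assumption, a hypothetical automaton with 100 states and 380 transitions has an exponent of roughly 1.29, while one with 100 states and 725 transitions has an exponent of roughly 1.43, which matches the two mean values reported above in magnitude.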

The logistic model (cf. Figure 7(b)) confirms this. The scaling exponent $\beta$ yields a good predictor for 
the success of partition refinement (i.e., $\chi^2 = 279.61$, 1 degree of freedom, 95% confidence level, 
p-value of 0.0001). Both the sensitivity and the specificity are at 71%, suggesting that the model is 
comparably strong at detecting success (sensitivity) and failure (specificity). The model corroborates 
the view that if the scaling exponent $\beta$ increases, the probability for a successful partition refinement 
decreases. 

Figure 7: A logistic model for the prediction of success of partition refinement based on $\beta$. 
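
Sensitivity and specificity, as used for all models in this section, follow the standard definitions: the share of actual successes that the model predicts as successes, and the share of actual failures that it predicts as failures. A minimal sketch, with hypothetical function and variable names, is:

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for two equal-length boolean sequences.

    True stands for 'partition refinement succeeded'.
    """
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # true positives
    fn = sum(1 for a, p in pairs if a and not p)      # false negatives
    tn = sum(1 for a, p in pairs if not a and not p)  # true negatives
    fp = sum(1 for a, p in pairs if not a and p)      # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one actual success and one actual failure")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy labels for illustration only, not data from the study.
    actual = [True, True, True, False, False]
    predicted = [True, True, False, False, True]
    print(sensitivity_specificity(actual, predicted))
```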

There is another intriguing aspect to this model. As the Component Interaction Automata specifications 
for real-world systems mature, the scaling exponents plateau at the mean value $\mu_\beta = 1.36$. This 
is exactly the value at which the model predicts the probability for the success of partition refinement to 
be 0.5. This suggests that one in two Component Interaction Automata specifications for real-world 
software systems can be reduced by partition refinement. In fact, the odds are slightly in favor of 
success for partition refinement of real-world software system specifications (cf. Grinstead and Snell's 
anecdote of Chevalier de Méré's rolling dice bet [22]). The probability for success is actually somewhat 
above 0.5, as rounding of $\mu_\beta$ pushes its value up. 

The concentration of incoming transitions $G_{in}$ fails to serve as a predictor for the success of partition 
refinement (i.e., $\chi^2 = 1.76$, 1 degree of freedom, 95% confidence level, p-value of 0.1847). However, 
this does not come as a surprise. The Gini coefficient for incoming transitions appears to be independent 
of the modeled behavior. As far as $G_{in}$ is concerned, the success of partition refinement cannot be 
predicted. From the point of view of partition refinement, it matters more how many behavioral variants a 
state can produce than how many behavioral variants a state depends on. 

Finally, we explored the concentration of outgoing transitions $G_{out}$. Unlike $G_{in}$, $G_{out}$ furnishes 
us with a suitable predictor for the success of partition refinement. Even though the margin is small, 
it is sufficiently significant to provide us with a discriminator for the success of partition refinement. 
Partition refinement is more likely to fail if the value of $G_{out}$ moves towards 0.34. In contrast, if the 
value of $G_{out}$ moves closer to 0.43, partition refinement is more likely to succeed (cf. Figure 8(a)). 
There is a narrow margin, ±4%, that determines the success or failure of partition refinement. This value 
is of specific significance, as it corresponds exactly to the threshold defined by Vasa et al. [25] for the 
identification of major shifts in evolving software systems. In other words, a deviation from the mean 
value $\mu_{G_{out}}$ by more than 4% significantly influences the success of partition refinement. 
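
As an illustration of the measure itself, the sketch below computes a Gini coefficient over the number of outgoing transitions per state. It assumes the standard rank-based Gini formula applied to per-state out-degree counts and a transition representation as `(source, label, target)` triples; both the representation and the function names are assumptions for this example.

```python
from collections import Counter


def gini(values):
    """Standard Gini coefficient of a list of non-negative numbers."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # no states or no transitions: treat as perfectly even
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def out_degree_gini(states, transitions):
    """Gini coefficient of outgoing-transition counts.

    ASSUMPTION: transitions are (source, label, target) triples.
    States without outgoing transitions count as 0.
    """
    out_degree = Counter(src for src, _label, _dst in transitions)
    return gini([out_degree.get(s, 0) for s in states])


if __name__ == "__main__":
    states = ["s0", "s1", "s2"]
    transitions = [("s0", "a", "s1"), ("s0", "b", "s2"), ("s1", "c", "s2")]
    print(out_degree_gini(states, transitions))
```

A value of 0 means that all states offer the same number of outgoing transitions, while values closer to 1 mean that behavioral choices are concentrated in a few states.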

We can construct a logistic model using $G_{out}$ ($\chi^2 = 148.63$, 1 degree of freedom, 95% confidence 
level, p-value of 0.0001). However, sensitivity and specificity are not as strong as in the case of the 
model for $\beta$. In fact, the model is stronger at detecting failure (specificity at 69%) and weaker at 
detecting success (sensitivity at 57%) of partition refinement. Nevertheless, it is still more reliable 
than a simple guess. 

Figure 8: A logistic model for the prediction of success of partition refinement based on $G_{out}$. 
(a) Partition refinement is successful? (x-axis: Gini coefficients of outgoing transitions). 
(b) The influence of $G_{out}$. 

The models for $\beta$ and $G_{out}$ point in opposite directions. Partition refinement is more likely to 
succeed if the automaton exhibits a low $\beta$ value and a high $G_{out}$ value. In other words, a large 
number of behavioral choices in selected states can assist partition refinement in producing a smaller state 
space. However, the number of these behavioral choices has to be balanced with the total number of 
choice points in an automaton, as indicated by the model for $\beta$. 



5 Conclusion 

Component Interaction Automata provide a fitting technique to capture and analyze the temporal facets 
of hierarchically structured component-based systems. It is, however, in the nature of automata-based 
approaches that the respective specifications suffer from a combinatorial state explosion problem. For 
this reason, an effective use of Component Interaction Automata for the specification and analysis of 
real-world software systems may become difficult, if not impossible, due to the underlying complexity 
of the systems being modeled. We, therefore, seek to find suitable abstraction methods that can help us 
to cope with the state explosion problem. 

Partition refinement through weak bisimulation can alleviate the impact of state explosion, but this 
technique too exhibits exponential time and space complexity [16]. Worse, success is erratic. To better 
understand why this abstraction technique succeeds in some cases and fails in others, we have conducted 
an empirical study on 1,680 Component Interaction Automata specifications and constructed several 
logistic regression models that can explain the observed performance of partition refinement. We learn 
that structure has a bigger impact than size on the success of partition refinement. However, we cannot 
completely dismiss size as a contributing factor to the success of partition refinement. Eventually, the 
partition refinement algorithm will succumb to the size of an automaton. Even though the topology of 
an automaton can positively influence the outcome of partition refinement, we must not neglect size 
altogether as it affects the running time of partition refinement. 

Partition refinement can achieve excellent results and yield strong state space reduction ratios (cf. 
Figure 9). However, the results depend on the presence of "synchronization cliques", community structures 
that partition refinement can eliminate in the refinement process (cf. Figure 5). These cliques have 
to be of sufficient size to have an impact. For example, we need to be able to remove 84% or more 
internal synchronizations from an automaton in order to achieve an overall state space reduction ratio 
of 75% or more (cf. Figure 9(b)). But again, the internal synchronizations must be of the right kind: 
members of cliques. Simply having many internal synchronizations occurring in an automaton does not 
suffice; they have to occur in the right structural artifacts. It is the structure, not size, that most 
influences the outcome of the refinement process. 

Figure 9: The effectiveness of partition refinement. 
(a) Likelihood of achieving 50% state space reduction. (b) Likelihood of achieving 75% state space reduction. 
(x-axis in both panels: internal synchronizations reduction ratio in percent). 
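
The notion of a state space reduction ratio can be illustrated with a deliberately simplified sketch. The code below performs naive partition refinement for strong bisimulation on a plain labeled transition system and reports the fraction of states that were merged. It is not the algorithm evaluated in this study, which refines with respect to weak bisimulation and treats internal synchronizations specially; the transition representation and all names are assumptions for illustration.

```python
def partition_refinement(states, transitions):
    """Naive signature-based refinement (strong bisimulation only).

    ASSUMPTION: transitions are (source, label, target) triples.
    Returns a dict mapping each state to the id of its block.
    """
    successors = {s: [] for s in states}
    for src, label, dst in transitions:
        successors[src].append((label, dst))

    block = {s: 0 for s in states}  # start with a single block
    while True:
        # A state's signature: its current block plus the set of
        # (label, target block) pairs it can reach in one step.
        signature = {
            s: (block[s], frozenset((l, block[d]) for l, d in successors[s]))
            for s in states
        }
        ids = {}
        new_block = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        # Blocks can only split, so an unchanged count means a fixed point.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


def reduction_ratio(states, transitions):
    """Fraction of states removed by merging equivalent states."""
    blocks = set(partition_refinement(states, transitions).values())
    return 1.0 - len(blocks) / len(states)


if __name__ == "__main__":
    states = [0, 1, 2, 3]
    transitions = [(0, "a", 1), (0, "a", 2), (1, "b", 3), (2, "b", 3)]
    print(reduction_ratio(states, transitions))
```

In the small example, states 1 and 2 offer the same behavior and collapse into one block, so one of four states is removed.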

The application of Component Interaction Automata for the specification and analysis of component-based 
software systems is similar in character to a "non-cooperative game" [18]. There are competing 
forces at work that need to be balanced in order to achieve the desired outcome. Automata-based techniques 
can be used for the specification of real-world software systems, but the level of granularity, in 
terms of both structure and size, has to be chosen carefully to compensate for the inherent and inevitable 
associated state space explosion. 